Search CORE

Multiple testing correction in linear mixed models.

Author: Eskin Eleazar
Han Buhm
Hormozdiari Farhad
Joo Jong Wha J
Publication venue: eScholarship, University of California
Publication date: 01/04/2016
Field of study

BackgroundMultiple hypothesis testing is a major issue in genome-wide association studies (GWAS), which often analyze millions of markers. The permutation test is considered to be the gold standard in multiple testing correction as it accurately takes into account the correlation structure of the genome. Recently, the linear mixed model (LMM) has become the standard practice in GWAS, addressing issues of population structure and insufficient power. However, none of the current multiple testing approaches are applicable to LMM.ResultsWe were able to estimate per-marker thresholds as accurately as the gold standard approach in real and simulated datasets, while reducing the time required from months to hours. We applied our approach to mouse, yeast, and human datasets to demonstrate the accuracy and efficiency of our approach.ConclusionsWe provide an efficient and accurate multiple testing correction approach for linear mixed models. We further provide an intuition about the relationships between per-marker threshold, genetic relatedness, and heritability, based on our observations in real data

ForestPMPlot: A Flexible Tool for Visualizing Heterogeneity Between Studies in Meta-analysis.

Author: Eskin Eleazar
Han Buhm
Kang Eun Yong
Li Xiao
Park Yurang
Segrè Ayellet V
Publication venue: eScholarship, University of California
Publication date: 18/05/2016
Field of study

Meta-analysis has become a popular tool for genetic association studies to combine different genetic studies. A key challenge in meta-analysis is heterogeneity, or the differences in effect sizes between studies. Heterogeneity complicates the interpretation of meta-analyses. In this paper, we describe ForestPMPlot, a flexible visualization tool for analyzing studies included in a meta-analysis. The main feature of the tool is visualizing the differences in the effect sizes of the studies to understand why the studies exhibit heterogeneity for a particular phenotype and locus pair under different conditions. We show the application of this tool to interpret a meta-analysis of 17 mouse studies, and to interpret a multi-tissue eQTL study

Using genomic annotations increases statistical power to detect eGenes.

Author: Duong Dat
Ernst Jason
Eskin Eleazar
Han Buhm
Hormozdiari Farhad
Sul Jae Hoon
Zou Jennifer
Publication venue: eScholarship, University of California
Publication date: 01/06/2016
Field of study

MotivationExpression quantitative trait loci (eQTLs) are genetic variants that affect gene expression. In eQTL studies, one important task is to find eGenes or genes whose expressions are associated with at least one eQTL. The standard statistical method to determine whether a gene is an eGene requires association testing at all nearby variants and the permutation test to correct for multiple testing. The standard method however does not consider genomic annotation of the variants. In practice, variants near gene transcription start sites (TSSs) or certain histone modifications are likely to regulate gene expression. In this article, we introduce a novel eGene detection method that considers this empirical evidence and thereby increases the statistical power.ResultsWe applied our method to the liver Genotype-Tissue Expression (GTEx) data using distance from TSSs, DNase hypersensitivity sites, and six histone modifications as the genomic annotations for the variants. Each of these annotations helped us detected more candidate eGenes. Distance from TSS appears to be the most important annotation; specifically, using this annotation, our method discovered 50% more candidate eGenes than the standard permutation [email protected] or [email protected]

A theory-based practical solution to correct for sex-differential participation bias

Author: Han Buhm
Lee Hanbin
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/06/2022
Field of study

Most genomic cohorts are retrospective where the exposures and outcomes are predetermined prior to sample collection. Therefore, a spurious association between an exposure and an outcome can arise if both variables affect study participation. Such concerns were raised in previous studies questioning the representativeness of the UK Biobank. Recently, a genome-wide association study (GWAS) on biological sex found many autosomal hits and non-negligible autosomal heritability which the authors attribute to selection bias. In this study, we propose a simple and a practical method that can overcome sex-driven selection bias based on theoretical analysis and simulations.This work was supported by the National Research Foundation of Korea (NRF) (Grant number 2022R1A2B5B02001897) funded by the Korean government, Ministry of Science, and ICT. This study was supported by the Creative-Pioneering Researchers Program funded by Seoul National University (SNU)

Effectively Identifying eQTLs from Multiple Tissues by Combining Mixed Model and Meta-analytic Approaches

Author: Choi Ted
Eskin Eleazar
Han Buhm
Sul Jae Hoon
Ye Chun
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/06/2013
Field of study

Gene expression data, in conjunction with information on genetic variants, have enabled studies to identify expression quantitative trait loci (eQTLs) or polymorphic locations in the genome that are associated with expression levels. Moreover, recent technological developments and cost decreases have further enabled studies to collect expression data in multiple tissues. One advantage of multiple tissue datasets is that studies can combine results from different tissues to identify eQTLs more accurately than examining each tissue separately. The idea of aggregating results of multiple tissues is closely related to the idea of meta-analysis which aggregates results of multiple genome-wide association studies to improve the power to detect associations. In principle, meta-analysis methods can be used to combine results from multiple tissues. However, eQTLs may have effects in only a single tissue, in all tissues, or in a subset of tissues with possibly different effect sizes. This heterogeneity in terms of effects across multiple tissues presents a key challenge to detect eQTLs. In this paper, we develop a framework that leverages two popular meta-analysis methods that address effect size heterogeneity to detect eQTLs across multiple tissues. We show by using simulations and multiple tissue data from mouse that our approach detects many eQTLs undetected by traditional eQTL methods. Additionally, our method provides an interpretation framework that accurately predicts whether an eQTL has an effect in a particular tissue

CiteSeerX

Harvard University - DASH

FigShare

Effectively identifying regulatory hotspots while capturing expression heterogeneity in gene expression studies

Author: Eskin Eleazar
Han Buhm
J Joo Jong Wha
Sul Jae Hoon
Ye Chun
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 01/01/2014
Field of study

Expression quantitative trait loci (eQTL) mapping is a tool that can systematically identify genetic variation affecting gene expression. eQTL mapping studies have shown that certain genomic locations, referred to as regulatory hotspots, may affect the expression levels of many genes. Recently, studies have shown that various confounding factors may induce spurious regulatory hotspots. Here, we introduce a novel statistical method that effectively eliminates spurious hotspots while retaining genuine hotspots. Applied to simulated and real datasets, we validate that our method achieves greater sensitivity while retaining low false discovery rates compared to previous methods

Crossref

Harvard University - DASH

Springer - Publisher Connector

Comparison of Two Meta-Analysis Methods: Inverse-Variance-Weighted Average and Weighted Sum of Z-Scores

Author: Buhm Han
Cue Hyunkyu Lee
Ji Sung Lee
Seungho Cook
Publication venue: 'Korea Genome Organization'
Publication date: 01/12/2016
Field of study

The meta-analysis has become a widely used tool for many applications in bioinformatics, including genome-wide association studies. A commonly used approach for meta-analysis is the fixed effects model approach, for which there are two popular methods: the inverse variance-weighted average method and weighted sum of z-scores method. Although previous studies have shown that the two methods perform similarly, their characteristics and their relationship have not been thoroughly investigated. In this paper, we investigate the optimal characteristics of the two methods and show the connection between the two methods. We demonstrate that the each method is optimized for a unique goal, which gives us insight into the optimal weights for the weighted sum of z-scores method. We examine the connection between the two methods both analytically and empirically and show that their resulting statistics become equivalent under certain assumptions. Finally, we apply both methods to the Wellcome Trust Case Control Consortium data and demonstrate that the two methods can give distinct results in certain study designs

Meta-Analysis Identifies Gene-by-Environment Interactions as Demonstrated in a Study of 4,965 Mice

Author: Davis Richard C.
Eskin Eleazar
Furlotte Nicholas
Han Buhm
Joo Jong Wha J.
Kang Eun Yong
Lusis Aldons J.
Shih Diana
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2014
Field of study

Identifying environmentally-specific genetic effects is a key challenge in understanding the structure of complex traits. Model organisms play a crucial role in the identification of such gene-by-environment interactions, as a result of the unique ability to observe genetically similar individuals across multiple distinct environments. Many model organism studies examine the same traits but under varying environmental conditions. For example, knock-out or diet-controlled studies are often used to examine cholesterol in mice. These studies, when examined in aggregate, provide an opportunity to identify genomic loci exhibiting environmentally-dependent effects. However, the straightforward application of traditional methodologies to aggregate separate studies suffers from several problems. First, environmental conditions are often variable and do not fit the standard univariate model for interactions. Additionally, applying a multivariate model results in increased degrees of freedom and low statistical power. In this paper, we jointly analyze multiple studies with varying environmental conditions using a meta-analytic approach based on a random effects model to identify loci involved in gene-by-environment interactions. Our approach is motivated by the observation that methods for discovering gene-by-environment interactions are closely related to random effects models for meta-analysis. We show that interactions can be interpreted as heterogeneity and can be detected without utilizing the traditional uni- or multi-variate approaches for discovery of gene-by-environment interactions. We apply our new method to combine 17 mouse studies containing in aggregate 4,965 distinct animals. We identify 26 significant loci involved in High-density lipoprotein (HDL) cholesterol, many of which are consistent with previous findings. Several of these loci show significant evidence of involvement in gene-by-environment interactions. An additional advantage of our meta-analysis approach is that our combined study has significantly higher power and improved resolution compared to any single study thus explaining the large number of loci discovered in the combined study

Harvard University - DASH